Abstract

LogEx is a learning environment that supports students in rewriting propositional logical
formulae, using standard equivalences. We organized a pilot study to prepare a large scale
evaluation of the learning environment. In this paper we describe this study and its outcomes,
which teach us valuable lessons for the large scale evaluation.
1 Introduction

Students learning propositional logic practice by solving different kinds of exercises. Many of 
these exercises are solved stepwise. To support a student solving such exercises an intelligent 
tutoring system can be very effective [9]. At the Open University of the Netherlands we 
are developing a learning environment (LE), LogEx^1, which supports students in rewriting
propositional logical formulae, using standard equivalences. We intend to evaluate this LE 
with a large group of students later this year, and to prepare this evaluation we organized a 
small scale evaluation in December 2014 [6]. Detailed loggings of the learning environment 
offer the possibility to analyze the way students use our LE. For example, in an earlier
study [5] loggings of students working on normalizing propositional logic expressions were 
used to construct a probability model of the correctness of the use of rules. In the large 
scale study we want to perform, our main focus is the question whether or not a student 
learns by using our LE. We will compare different versions of LogEx that have more (or 
fewer) feedback services. In the pilot study our main questions were: (1) do students learn 
by using the LE, and (2) what lessons can we learn for a large scale evaluation. 

This paper is organized as follows. In the next two sections we describe the LogEx 
learning environment, and the experiment we performed with it. Section 4 summarizes the 
results of the assessment tests and loggings. We conclude with lessons learned from the 
experiment. 

2 The LogEx learning environment

LogEx is a learning environment (LE) in which a student practices rewriting propositional 
logical formulae, using standard equivalences. The LE contains three kinds of exercises: 
rewriting a formula in DNF, in CNF, and proving the equivalence of two formulae. A
student enters her solution stepwise. In exercises on equivalence proofs she has to motivate
each step with a rule name; in exercises on rewriting to normal form this motivation is
optional. When proving two formulae equivalent, a student can work both forwards and
backwards. Figure 1 shows an example of a partial proof consisting of two backward steps.
A student can change the direction in which she is working at any moment.

^1 http://ideas.cs.uu.nl/logex/

■ Figure 1 Screenshot of LogEx.

In the version we used in the pilot, correction per step is turned on. This implies that a 
student receives feedback after each step. Feedback concerns syntax errors, such as missing 
parentheses, or rule feedback. After a student has entered a formula, the LE will try to 
recognize the rule that was used. If the LE can detect a rule, it will compare this rule 
with the rule name provided by the student, and give an error message if the wrong rule 
name was given. If no rule is detected, the LE checks semantically whether or not the new 
and old formulae are equivalent. If they are not equivalent, the LE uses a set of common 
mistakes, also called buggy rules, to try to give informative feedback. For example, if a
student rewrites $\neg(p \lor q) \lor (\neg\neg p \land \neg q) \lor \neg q$ into
$(\neg p \lor \neg q) \lor (\neg\neg p \land \neg q) \lor \neg q$, then the LE reports that this
step is incorrect, and mentions that when applying
DeMorgan's rule, a disjunction is transformed into a conjunction.
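
The step check described above (semantic equivalence first, then a lookup in a set of buggy rules) can be illustrated with a minimal sketch. It is not the LogEx implementation: the tuple encoding of formulae, the function names, and the single top-level buggy rule are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical encoding: ("var", name), ("not", f), ("and", f, g), ("or", f, g).
def var(name): return ("var", name)
def neg(f): return ("not", f)
def conj(f, g): return ("and", f, g)
def disj(f, g): return ("or", f, g)

def variables(f):
    """Collect the names of all propositional variables in a formula."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(sub) for sub in f[1:]))

def evaluate(f, env):
    """Evaluate a formula under a truth assignment (dict: name -> bool)."""
    tag = f[0]
    if tag == "var":
        return env[f[1]]
    if tag == "not":
        return not evaluate(f[1], env)
    if tag == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    if tag == "or":
        return evaluate(f[1], env) or evaluate(f[2], env)
    raise ValueError(f"unknown connective: {tag}")

def equivalent(f, g):
    """Truth-table check: do f and g agree under every assignment?"""
    names = sorted(variables(f) | variables(g))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(f, env) != evaluate(g, env):
            return False
    return True

def buggy_de_morgan(f):
    """Assumed buggy rule: not (a or b) wrongly rewritten to (not a) or (not b)."""
    if f[0] == "not" and f[1][0] == "or":
        return disj(neg(f[1][1]), neg(f[1][2]))
    return None

def check_step(old, new):
    """Classify a rewrite step (top-level rewrites only, for brevity)."""
    if equivalent(old, new):
        return "equivalent"
    if buggy_de_morgan(old) == new:
        return "buggy: DeMorgan turns a disjunction into a conjunction"
    return "not equivalent"

p, q = var("p"), var("q")
wrong = check_step(neg(disj(p, q)), disj(neg(p), neg(q)))
right = check_step(neg(disj(p, q)), conj(neg(p), neg(q)))
```

In this sketch `wrong` carries the buggy-rule message and `right` is classified as equivalent. A realistic checker would also have to look for buggy rewrites inside subformulae, which is omitted here.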

A student can ask for a hint (e.g. perform a backward step or apply DeMorgan), a next 
step, or a complete worked out solution, at any moment. The LE contains solution strategies 
to calculate this feed forward. A student can choose between exercises of different difficulty 
levels, or enter her own exercises. Feedback and feed forward are available for all exercises. 
LogEx integrates improved versions of earlier tools to rewrite formulas in disjunctive normal 
form nil] and to prove equivalences [2] . The main learning goals for which we want to use 
the LE are: After practicing with the LE a student 

■ can recognize applicable rules
■ can apply rules correctly
■ can rewrite a formula in normal form
■ can prove the equivalence of two formulae using standard equivalences
■ can demonstrate strategic insight in how to rewrite a formula in normal form or prove
  an equivalence in an efficient way.
3 The experiment

In the last decade we performed a number of small scale experiments with earlier versions 
of our LE. Participants in these experiments were students of the Open University of the 
Netherlands (OUNL). Although we learned quite a lot from these experiments, they had 
two limitations. First, since OUNL is a distance university, students evaluated the LE at 
home. This meant that we could not observe students using the LE, and there was little
control over how students worked with the LE. Second, the students of OUNL are
rather heterogeneous. 

We performed a new experiment with our LE with a group of first-year students of 
Utrecht University. This experiment had two goals: 

■ evaluate the use of the LE: do students achieve the learning goals mentioned in Section 2,
  and is the support offered by the LE sufficient to reach these goals?

■ prepare for a large scale evaluation of the LE this year: is the information about the
  LE sufficient to work with it, do students get enough time to practice with the LE,
  do students get enough time to answer the pre-test and post-test, and is the logging
  adequate?

Before the experiment started, we organized a short introduction to the purpose of the 
experiment and we explained the main features of the LE. Then students got ten minutes
to take a pre-test consisting of three exercises: prove that a formula is a tautology, rewrite
a formula into normal form, and prove that two formulae are equivalent. They practiced
for 75 minutes with the LE, and then took a post-test comparable to the pre-test. We used a
special version of the LE with a fixed set of exercises: five on rewriting a formula in DNF, 
five in CNF, and five on proving an equivalence. We logged all interactions of the students 
with the LE. During the pre-test and post-test, students could make use of a paper sheet 
with the list of standard equivalences that were allowed to solve the problems. 

Five students participated in the experiment, all male, age 18-22, from the disciplines 
computer science, information science, and game technology. All students took part in the 
course Logic in Computer Science, and had already worked on the subjects covered in the 
LE in this course, except for proving an equivalence using standard equivalences, which is 
not part of the learning goals of the course. 

4 Results

We analyzed the loggings of our LE. Only the first three normal form and proof exercises 
were completed by almost all students, and we only include these exercises in our results. 
Figure 2 shows the number of erroneous steps as a fraction of the total number of correct
steps performed by the student. The x-axis displays the type and number of the exercise. 
All but one student completed the exercises in the order presented on this axis. Student 1 
completed the exercises on proofs before the exercises on CNF. Moreover, in the first three 
exercises on CNF this student used only the next step button. He did complete the fourth 
CNF exercise without help of the LE, but we did not include this exercise in the figures. 
The figure shows that students 2, 3, and 5 gradually make fewer mistakes, and this also
holds for the last three exercises of student 4. The relatively high number of mistakes in the
exercises dnf2 and pr2 can be explained by the difficulty of these exercises. Dnf2 asked for 
more complicated applications of the distribution rule. Pr2 had a rather simple solution, 
but students who did not see this solution got rather long formulae. 

We also logged the time that a student needed to complete an exercise. In Figure 3 we
present the time per correct step in minutes. At the start students still learn how to use the
LE, and this likely partially accounts for the decrease of time needed to take a step. Most
students solved the post-test faster than the pre-test. This indicates that students learn to
solve the exercises faster by using the LE. The extra amount of time needed to solve pr0
can be explained by the fact that this type of exercise was new for this group of students.

To evaluate the last learning goal, the development of strategic insight, we compared the 
number of correct steps of the student solution with the number of steps of the example 
solution generated by the LE. Here we also took into account how far a solution was simplified.
For example, in the first exercise, some students used three steps to reach the normal
form $(q \land \neg r) \lor q \lor r$, others used four steps to reach the simplification
$q \lor r$. The worked out solution also takes three steps to reach
$(q \land \neg r) \lor q \lor r$ and four steps to reach $q \lor r$.
Hence, in Figure 4 we score both solutions as 1, namely 3/3 resp. 4/4. The outcome
suggests that students do not learn to solve the exercises more efficiently. We will discuss 
this outcome later. 
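
The efficiency score used here is the ratio between the number of correct steps in the student solution and the number of steps in the worked-out solution that reaches the same end formula. A minimal sketch of that arithmetic follows; the function name is hypothetical and the six-versus-four case is an invented illustration, not a result from the pilot.

```python
from fractions import Fraction

def efficiency(student_steps, worked_out_steps):
    """Student steps as a fraction of worked-out steps; 1 means equally short."""
    if worked_out_steps <= 0:
        raise ValueError("worked-out solution must have at least one step")
    return Fraction(student_steps, worked_out_steps)

# The two cases from the text: both score 1.
score_normal_form = efficiency(3, 3)
score_simplified = efficiency(4, 4)

# Invented example: a student needing six steps where four suffice.
score_detour = efficiency(6, 4)
```

Values above 1 indicate a longer derivation than the worked-out solution, so a decreasing score over successive exercises would point to growing strategic insight.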

The results of the pre-test and post-test were not very informative. The main positive 
result is that students completed more exercises in the post-test, see Figure 5. They made
a few more mistakes in the post-test, but this might be related to the number of steps they 
performed. Also, there was no gain in efficiency. Because of the low number of students we 
could not compare the difficulty of the pre-test and post-test. We think that the post-test 
was slightly more difficult than the pre-test. The large scale evaluation will be designed in 
such a way that we can compensate for different difficulties in pre-test and post-test. 

5 Lessons learned

In this section we discuss the consequences of our analyses of the loggings and the tests, 
together with observations we made during the evaluation session. 

5.1 Do students reach the learning goals? 

The first question we posed in Section 3 was: do students reach the learning goals and does
LogEx sufficiently support them to reach these goals?

Recognizing applicable rules 

Students learn to recognize applicable rules, with two exceptions. LogEx admits generalizations
of DeMorgan and distribution rules. For example, it is allowed to rewrite
$\neg(p \land q \land r)$ into $\neg p \lor \neg q \lor \neg r$ in one step. These generalizations
were mentioned in the introduction to the evaluation, but they were not explicitly present in
the list of rules. Students did
not use these generalized rules. A second rule that was hardly used is absorption. This rule
is not needed to rewrite a formula in normal form, but it can simplify the calculations. Only
one of the students used this rule on his own initiative, three others only after a hint suggested
using absorption, and one student did not use this rule at all. We have to think of a possibility
to make students aware of the usefulness of this rule. A possible solution might be that in 
case absorption is applicable, but a student chooses another rule, LogEx will point out the 
possibility to simplify the formula using absorption. 
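
The proposed reminder could be driven by a check of whether absorption applies to the current formula. The sketch below assumes a hypothetical tuple encoding of formulae and only inspects the top level of a binary formula; the function names and the message text are illustrative and not taken from LogEx.

```python
# Hypothetical encoding: ("var", name), ("not", f), ("and", f, g), ("or", f, g).

def absorb(f):
    """Return the simplified formula if absorption applies at the top level.

    Covers a or (a and b) => a and a and (a or b) => a, including the
    variants with the operands in the other order. Returns None otherwise.
    """
    if f[0] not in ("and", "or"):
        return None
    dual = "or" if f[0] == "and" else "and"
    for keep, other in ((f[1], f[2]), (f[2], f[1])):
        if other[0] == dual and keep in other[1:]:
            return keep
    return None

def absorption_reminder(formula, chosen_rule):
    """Suggest absorption when it was applicable but another rule was chosen."""
    if chosen_rule != "absorption" and absorb(formula) is not None:
        return "This formula could also be simplified using absorption."
    return None

q = ("var", "q")
r = ("var", "r")
formula = ("or", q, ("and", q, ("not", r)))   # q or (q and not r)
simplified = absorb(formula)
reminder = absorption_reminder(formula, "distribution")
```

Here `simplified` is the formula `q`, and `reminder` holds the suggestion because the student picked a different rule. A full version would also search subformulae and formulae with more than two operands.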

Apply rules correctly 

LogEx does provide feedback at the rule level, and we find that this feedback helps to achieve 
the second learning goal. In general the error messages are sufficient for a student to correct 
mistakes. However, this is not always the case. In case a student accidentally rewrites a
formula into an equivalent formula while making a mistake, no error-specific message is
given. During the session we were asked several times by a student why his rewriting was
incorrect. Moreover, in the loggings we found some examples where students could not repair
their mistakes directly in such a situation. In a next version we will also provide
error-specific feedback when the new formula is equivalent to the previous one.

■ Figure 4 Efficiency measured by the number of performed steps as a fraction of the number of
steps in a worked out solution.

■ Figure 5 Results of pre-test and post-test.

Analysis of the loggings revealed some missing buggy rules, for example rewriting of
$(p \lor q) \land (\neg p \lor \neg q)$ into $F$. From the loggings we also learned that error messages for
syntax errors were not always helpful: students sometimes need several attempts to correct 
a syntax error. 

Rewrite a formula in normal form and prove the equivalence of two formulae 

The loggings and tests indicate that students do learn to rewrite a formula in normal form. 
Students were able to complete the exercises without too much use of the help button, and 
most students could finish the exercises on normal forms in the post-test. Since time in the 
post-test was too short to complete all the exercises, we can only use the loggings to draw 
conclusions about proving equivalence. The loggings indicate that students also learn to 
solve this kind of exercise.

Demonstrate strategic insight 

The loggings and tests do not show improvement on the last learning goal. A reason might 
be that students had to answer different kinds of exercises, which needed partially different 
strategies. A careful analysis of the loggings shows that this is not the only reason. For 
example, one of the students developed a personal strategy of introducing double negations 
combined with the use of DeMorgan. In most cases this strategy was not effective, but 
since he got no feedback on the use of this strategy, he kept using it, also in the post-test. 
We think that there are at least two reasons why a student does not learn to work more 
efficiently. The first reason is that LogEx does not provide feedback on the strategic level,
and hence gives no information about a strategy for solving an exercise. This information 
is given implicitly by hints or next steps, but only one student made use of hints or next 
steps. The possibility to compare a solution with the complete solution was only used by 
one student. Help avoidance is one of the known problems with LEs [a Em]. This might 
be a second reason that the last goal was not met. Although in general students learn more 
when they have to ask for help themselves [5], in this case it seems necessary that the system
provides help without being asked. LogEx recognizes when a student solution diverges from 
one of the possible paths of the strategy that we implemented. In a next version we might 
provide a warning in such a case. Alternatively, LogEx might warn a student if a solution is 
getting longer than the worked-out solution. A third possibility is to postpone this warning 
until a student has finished an exercise, but this might cause frustration. 

Other remarks concerning the use of the tool 

To prevent unreadable formulae and endless derivations, the use of associativity is implicit 
in LogEx. This means, for instance, that a student does not have to introduce or change
parentheses before an application of idempotency in a formula such as
$q \lor p \lor p \lor s$. As a consequence, LogEx will consider $p \lor q \lor r$ and
$(p \lor q) \lor r$ to be the same formula. There is
no separate rule available to delete parentheses. In the second DNF exercise most students
reached the normal form $q \lor (\neg p \lor q) \lor p$. At this point, the students tried to get rid of the
parentheses, but LogEx did not accept this. In a next version we will have to introduce the 
possibility to delete parentheses. 
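
One common way to make associativity implicit is to flatten nested applications of the same connective before comparing formulae. The sketch below illustrates that idea under an assumed tuple encoding; it is not a description of how LogEx represents formulae internally.

```python
# Hypothetical encoding: ("var", name), ("not", f), and n-ary ("and", ...) / ("or", ...).

def flatten(f):
    """Merge directly nested uses of the same connective into one n-ary node."""
    tag = f[0]
    if tag == "var":
        return f
    if tag == "not":
        return ("not", flatten(f[1]))
    args = []
    for sub in f[1:]:
        sub = flatten(sub)
        if sub[0] == tag:
            args.extend(sub[1:])   # same connective: splice operands in
        else:
            args.append(sub)       # different connective: keep as one operand
    return (tag, *args)

def same_modulo_associativity(f, g):
    """Two formulae count as the same if their flattened forms coincide."""
    return flatten(f) == flatten(g)

p, q, r = ("var", "p"), ("var", "q"), ("var", "r")
left_nested = ("or", ("or", p, q), r)    # (p or q) or r
right_nested = ("or", p, ("or", q, r))   # p or (q or r)
mixed = ("and", ("or", p, q), r)         # (p or q) and r
```

Both `left_nested` and `right_nested` flatten to the same three-operand disjunction, while the parentheses in `mixed` are kept because they separate different connectives. Under such a representation a request to delete redundant parentheses changes nothing, which may be why a dedicated rule is needed to accept it as a step.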

Some other minor points we learned about the LE concern user friendliness. Overall, 
students had no problems with the use of LogEx. However, we observed a student copying 
and pasting a previous formula when he wanted to correct the formula he was editing. He 
had not noticed that the mini-keyboard in the user interface contains an undo button. 

5.2 What lessons can we learn for the large scale evaluation? 

The use of a pilot study is an important principle in the design of evaluation studies [6]. 
Overall, the evaluation went well, but students need more time for the pre-test and post-test.
We log all requests and messages between the user and the domain reasoner, but some
actions are not yet logged at this moment. For example, LogEx offers the possibility to undo
some steps in a proof, but the use of the undo button is not logged. We can only indirectly 
assume that a student removed part of her proof from the fact that the old formula in a 
rewriting is not equal to the new formula. Without knowing if and where students use 
the undo button, it is very hard to draw conclusions about the effectiveness of the student 
solutions. 

To draw conclusions about the learning of the students during the use of the tool, it is 
necessary that the order in which students do the exercises is fixed.

The instruction about the use of commutativity was not clear. LogEx admits commutative
variants of the standard equivalences. For example, the rewriting of
$\phi \land (\psi \lor \chi)$ into $(\phi \land \psi) \lor (\phi \land \chi)$ is in the list of
standard equivalences, and LogEx also allows the variant where $(\psi \lor \chi) \land \phi$ is
rewritten into $(\psi \land \phi) \lor (\chi \land \phi)$. However, LogEx
considers the rewriting of $(\psi \lor \chi) \land \phi$ into
$(\phi \land \psi) \lor (\phi \land \chi)$ to be a combination
of distributivity and commutativity, which cannot be performed in one step. Students did
perform this kind of step without realizing why LogEx did not accept it.
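
The distinction between an accepted commutative variant and a rejected combined step can be made concrete with a small sketch. It assumes a hypothetical tuple encoding and takes distribution of conjunction over disjunction as the example rule; the function names are illustrative only.

```python
# Hypothetical encoding: ("var", name), ("and", f, g), ("or", f, g).

def distribute(f):
    """Apply one distribution step at the top level, keeping operand order.

    phi and (psi or chi)  =>  (phi and psi) or (phi and chi)
    (psi or chi) and phi  =>  (psi and phi) or (chi and phi)
    Returns None when neither pattern matches.
    """
    if f[0] != "and":
        return None
    left, right = f[1], f[2]
    if right[0] == "or":
        return ("or", ("and", left, right[1]), ("and", left, right[2]))
    if left[0] == "or":
        return ("or", ("and", left[1], right), ("and", left[2], right))
    return None

def is_single_distribution_step(old, new):
    """True only if new is exactly the order-preserving result of distribution."""
    return distribute(old) == new

phi, psi, chi = ("var", "phi"), ("var", "psi"), ("var", "chi")
standard = is_single_distribution_step(
    ("and", phi, ("or", psi, chi)),
    ("or", ("and", phi, psi), ("and", phi, chi)))
variant = is_single_distribution_step(
    ("and", ("or", psi, chi), phi),
    ("or", ("and", psi, phi), ("and", chi, phi)))
combined = is_single_distribution_step(
    ("and", ("or", psi, chi), phi),
    ("or", ("and", phi, psi), ("and", phi, chi)))
```

In this sketch `standard` and `variant` are accepted, whereas `combined` is rejected because the operands were also swapped. All three rewrites are logically valid, so a message that names the extra commutativity step would explain the rejection to the student.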
6 Conclusion


The pilot indicates that with some adaptations, especially in feedback on the strategic level, 
LogEx can be a helpful LE for students who practice rewriting logical formulae. The large 
scale evaluation later this year will have to confirm these findings. The pilot was useful 
for the design of the large scale evaluation, in particular with respect to the timing of the 
components, the instruction, and the loggings. 